Abstract

The classic $\chi^2$ statistic for testing goodness-of-fit has long been a cornerstone of modern
statistical practice. The statistic consists of a sum in which each summand involves division
by the probability associated with the corresponding bin in the distribution being tested for
goodness-of-fit. Typically this division should precipitate rebinning to uniformize the
probabilities associated with the bins, in order to make the test reasonably powerful. With the now
widespread availability of computers, there is no longer any need for this. The present paper
provides efficient black-box algorithms for calculating the asymptotic confidence levels of a
variant on the classic $\chi^2$ test which omits the problematic division. In many circumstances,
it is also feasible to compute the exact confidence levels via Monte Carlo simulation. 
1. Introduction

A basic task in statistics is to ascertain whether a given set of independent and identically
distributed (i.i.d.) draws does not come from a specified probability distribution (this
specified distribution is known as the "model"). In the present paper, we consider the case
in which the draws are discrete random variables, taking values in a finite set. In accordance 
with the standard terminology, we will refer to the possible values of the discrete random 
variables as "bins" ("categories," "cells," and "classes" are common synonyms for "bins"). 

A natural approach to ascertaining whether the i.i.d. draws do not come from the specified 
probability distribution uses a root-mean-square statistic. To construct this statistic, we 
estimate the probability distribution over the bins using the given i.i.d. draws, and then 
measure the root-mean-square difference between this empirical distribution and the specified 
model distribution; see, for example, [1], page 123 of [2], or Section 2 below. If the draws
do in fact arise from the specified model, then with high probability this root-mean-square 
is not large. Thus, if the root-mean-square statistic is large, then we can be confident that 
the draws do not arise from the specified probability distribution. 

Let us denote by $x_{\mathrm{rms}}$ the value of the root-mean-square for the given i.i.d. draws; let
us denote by $X_{\mathrm{rms}}$ the root-mean-square statistic constructed for different i.i.d. draws that
definitely do in fact come from the specified model distribution. Then, the significance level
$\alpha$ is defined to be the probability that $X_{\mathrm{rms}} > x_{\mathrm{rms}}$ (viewing $X_{\mathrm{rms}}$, but not $x_{\mathrm{rms}}$, as
a random variable). The confidence level that the given i.i.d. draws do not arise from the
specified model distribution is the complement of the significance level, namely $1 - \alpha$.

Unfortunately, the confidence levels for the simple root-mean-square statistic are different 
for different model probability distributions. To avoid this seeming inconvenience (at least 
asymptotically), one may weight the average in the root-mean-square by the inverses of the 
model probabilities associated with the various bins, obtaining the classic $\chi^2$ statistic; see,
for example, [3] or Remark 2.1 below. However, with the now widespread availability of
computers, direct use of the root-mean-square statistic has become feasible (and actually 
turns out to be very convenient). The present paper provides efficient black-box algorithms 
for computing the confidence levels for any specified model distribution, in the limit of large 
numbers of draws. Calculating confidence levels for small numbers of draws via Monte Carlo 
can also be practical. 

The simple statistic described above would seem to be more natural than the standard
$\chi^2$ statistic of [3], is typically easier to use (since it does not require any rebinning of data),
and is more powerful in many circumstances, as we demonstrate both in Section 6 below
and more extensively in a forthcoming paper. Even more powerful is the combination of
the root-mean-square statistic and an asymptotically equivalent variation of the $\chi^2$ statistic,
such as the (log)likelihood-ratio or "$G^2$" statistic; the (log)likelihood-ratio and $\chi^2$ statistics
are asymptotically equivalent when the draws arise from the model, while the (log)likelihood-ratio
can be more powerful than $\chi^2$ for small numbers of draws (see, for example, [1]).

The rest of the present article has the following structure: Section 2 details the statistic discussed
above, expressing the confidence levels for the associated goodness-of-fit test in a form suitable
for computation. Section 3 discusses the most involved part of the computation of the
confidence levels, computing the cumulative distribution function of the sum of the squares of
independent centered Gaussian random variables. Section 4 summarizes the method for computing
the confidence levels of the root-mean-square statistic. Section 5 applies the method
to several examples. Section 6 very briefly illustrates the power of the root-mean-square.
Section 7 draws some conclusions and proposes directions for further research.

2. The simple statistic 

This section details the root-mean-square statistic discussed briefly in Section 1 and
determines its probability distribution in the limit of large numbers of draws, assuming that 
the draws do in fact come from the specified model. The distribution determined in this 
section yields the confidence levels (in the limit of large numbers of draws): Given a value x 
for the root-mean-square statistic constructed from i.i.d. draws coming from an unknown 
distribution, the confidence level that the draws do not come from the specified model is the 
probability that the root-mean-square statistic is less than x when constructed from i.i.d. 
draws that do come from the model distribution. 

To begin, we set notation and form the statistic $X$ to be analyzed. Given $n$ bins, numbered
$1, 2, \dots, n-1, n$, we denote by $p_1, p_2, \dots, p_{n-1}, p_n$ the probabilities associated with
the respective bins under the specified model; of course, $\sum_{k=1}^n p_k = 1$. To obtain a draw
conforming to the model, we select at random one of the $n$ bins, with probabilities $p_1, p_2,
\dots, p_{n-1}, p_n$. We perform this selection independently $m$ times. For $k = 1, 2, \dots, n-1, n$,
we denote by $Y_k$ the fraction of times that we choose bin $k$ (that is, $Y_k$ is the number of
times that we choose bin $k$, divided by $m$); obviously, $\sum_{k=1}^n Y_k = 1$. We define $X_k$ to be $\sqrt{m}$
times the difference of $Y_k$ from its expected value, that is,

$$X_k = \sqrt{m}\,(Y_k - p_k) \tag{1}$$

for $k = 1, 2, \dots, n-1, n$. Finally, we form the statistic

$$X = \sum_{k=1}^n X_k^2 \tag{2}$$

and now determine its distribution in the limit of large $m$. ($X$ is the square of the
root-mean-square statistic $\sqrt{\sum_{k=1}^n (mY_k - mp_k)^2 / m}$. Since the square root is a monotonically
increasing function, the confidence levels are the same whether determined via $X$ or via
$\sqrt{X}$; for convenience, we focus on $X$ below.)
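
The construction in (1) and (2) can be illustrated with a minimal Python sketch. The function name `rms_statistic_squared` and the sample counts and probabilities are hypothetical, chosen only for illustration.

```python
from math import sqrt


def rms_statistic_squared(counts, probs):
    """Return X = sum_k X_k^2 with X_k = sqrt(m) * (Y_k - p_k), as in (1) and (2)."""
    m = sum(counts)  # total number of draws
    if m <= 0:
        raise ValueError("need at least one draw")
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have one entry per bin")
    total = 0.0
    for count, p in zip(counts, probs):
        y = count / m              # observed fraction Y_k
        x_k = sqrt(m) * (y - p)    # X_k from (1)
        total += x_k * x_k
    return total


counts = [30, 50, 20]        # hypothetical numbers of draws falling in each bin
probs = [0.25, 0.5, 0.25]    # hypothetical model probabilities
X = rms_statistic_squared(counts, probs)
print(X, sqrt(X))            # X and the root-mean-square statistic itself
```

Here $m = 100$ and the observed fractions differ from the model by $0.05$, $0$, and $-0.05$, so $X = 100 \cdot (0.0025 + 0 + 0.0025) = 0.5$ up to rounding.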



Remark 2.1. The classic $\chi^2$ test for goodness-of-fit of [3] replaces (2) with the statistic

$$\chi^2 = \sum_{k=1}^n \frac{X_k^2}{p_k}, \tag{3}$$

where $X_1, X_2, \dots, X_{n-1}, X_n$ are the same as in (1) and (2). $\chi^2$ defined in (3) has the
advantage that its confidence levels are the same for every model distribution, independent
of the values of $p_1, p_2, \dots, p_{n-1}, p_n$, in the limit of large numbers of draws. In contrast, using
$X$ defined in (2) requires computing its confidence levels anew for every different model.

The multivariate central limit theorem shows that the joint distribution of $X_1, X_2, \dots,
X_n$ converges in distribution as $m \to \infty$, with the limiting generalized probability
density proportional to

$$\exp\left(-\sum_{k=1}^n \frac{x_k^2}{2p_k}\right) \delta\left(\sum_{k=1}^n x_k\right), \tag{4}$$

where $\delta$ is the Dirac delta; see, for example, [4] or Chapter 25 and Example 15.3 of [5].
The generalized probability density (4) is a centered multivariate Gaussian concentrated on
a hyperplane passing through the origin (the hyperplane consists of the points such that
$\sum_{k=1}^n x_k = 0$); the restriction of the generalized probability density (4) to the hyperplane
through the origin is also a centered multivariate Gaussian. Thus, the distribution of $X$
defined in (2) converges as $m \to \infty$ to the distribution of the sum of the squares of $n - 1$
independent Gaussian random variables of mean zero whose variances are the variances of
the restricted multivariate Gaussian distribution along its principal axes; see, for example,
[4] or Chapter 25 of [5]. Given these variances, the following section describes an efficient
algorithm for computing the probability that the associated sum of squares is less than any
particular value; this probability is the desired confidence level, in the limit of large numbers
of draws. See Sections 4 and 5 for further details.

To compute the variances of the restricted multivariate Gaussian distribution along its
principal axes, we multiply the diagonal matrix $D$ whose diagonal entries are $1/p_1, 1/p_2, \dots,
1/p_{n-1}, 1/p_n$ from both the left and the right by the projection matrix $P$ whose entries are

$$P_{jk} = \begin{cases} 1 - \frac{1}{n}, & j = k \\ -\frac{1}{n}, & j \ne k \end{cases} \tag{5}$$

for $j, k = 1, 2, \dots, n-1, n$ (upon application to a vector, $P$ projects onto the orthogonal
complement of the subspace consisting of every vector whose entries are all the same). The
entries of this product $B = PDP$ are

$$B_{jk} = \begin{cases} \frac{1}{p_k} - \frac{1}{n}\left(\frac{1}{p_j} + \frac{1}{p_k}\right) + \frac{1}{n^2}\sum_{l=1}^n \frac{1}{p_l}, & j = k \\ -\frac{1}{n}\left(\frac{1}{p_j} + \frac{1}{p_k}\right) + \frac{1}{n^2}\sum_{l=1}^n \frac{1}{p_l}, & j \ne k \end{cases} \tag{6}$$

for $j, k = 1, 2, \dots, n-1, n$. Clearly, $B$ is self-adjoint. By construction, exactly one of the
eigenvalues of $B$ is 0. The other eigenvalues of $B$ are the multiplicative inverses of the desired
variances of the restricted multivariate Gaussian distribution along its principal axes.
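
As a sanity check on the entries of $B = PDP$, the following Python sketch forms $B$ from the closed-form entries and compares the result with an explicit matrix product. The function names and the three-bin model are hypothetical, introduced only for this illustration.

```python
def entries_of_B(probs):
    """Return B = P D P from the closed-form entries in (6)."""
    n = len(probs)
    inv = [1.0 / p for p in probs]           # diagonal entries 1/p_k of D
    avg = sum(inv) / (n * n)                 # (1/n^2) * sum_l 1/p_l
    B = [[avg - (inv[j] + inv[k]) / n for k in range(n)] for j in range(n)]
    for k in range(n):
        B[k][k] += inv[k]                    # extra 1/p_k on the diagonal
    return B


def matmul(A, C):
    """Plain triple-loop product of two square matrices."""
    n = len(A)
    return [[sum(A[j][l] * C[l][k] for l in range(n)) for k in range(n)]
            for j in range(n)]


def product_PDP(probs):
    """Form P D P by explicit multiplication, as a cross-check."""
    n = len(probs)
    P = [[(1.0 if j == k else 0.0) - 1.0 / n for k in range(n)] for j in range(n)]
    D = [[1.0 / probs[k] if j == k else 0.0 for k in range(n)] for j in range(n)]
    return matmul(matmul(P, D), P)


probs = [0.5, 0.3, 0.2]                      # hypothetical model probabilities
B = entries_of_B(probs)
B_check = product_PDP(probs)
max_diff = max(abs(B[j][k] - B_check[j][k]) for j in range(3) for k in range(3))
row_sums = [sum(row) for row in B]           # each is ~0: constant vectors lie in the null space
print(max_diff, row_sums)
```

The vanishing row sums reflect the zero eigenvalue of $B$: the vector whose entries are all the same is annihilated by $P$.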

Remark 2.2. The $n \times n$ matrix $B$ defined in (6) is the sum of a diagonal matrix and a
low-rank matrix. The methods of [6, 7] for computing the eigenvalues of such a matrix $B$
require only either $O(n^2)$ or $O(n)$ floating-point operations. The $O(n^2)$ methods of [6, 7] are
usually more efficient than the $O(n)$ method of [7], unless $n$ is impractically large.

Remark 2.3. It is not hard to accommodate homogeneous linear constraints of the form
$\sum_{k=1}^n c_k x_k = 0$ (where $c_1, c_2, \dots, c_{n-1}, c_n$ are real numbers) in addition to the requirement
that $\sum_{k=1}^n x_k = 0$. Accounting for any additional constraints is entirely analogous to the
procedure detailed above for the particular constraint that $\sum_{k=1}^n x_k = 0$. The estimation of
parameters from the data in order to specify the model can impose such extra homogeneous
linear constraints; see, for example, Chapter 25 of [5]. A detailed treatment is available
in [8].



3. The sum of the squares of independent centered Gaussian random variables 

This section describes efficient algorithms for evaluating the cumulative distribution function
(cdf) of the sum of the squares of independent centered Gaussian random variables. The
principal tool is the following theorem, expressing the cdf as an integral suitable for evaluation
via quadratures (see, for example, Remark 3.4 below).

Theorem 3.1. Suppose that $n$ is a positive integer, $X_1, X_2, \dots, X_{n-1}, X_n$ are i.i.d. Gaussian
random variables of zero mean and unit variance, and $\sigma_1, \sigma_2, \dots, \sigma_{n-1}, \sigma_n$ are positive real
numbers. Suppose in addition that $X$ is the random variable

$$X = \sum_{k=1}^n |\sigma_k X_k|^2. \tag{7}$$

Then, the cumulative distribution function (cdf) $P$ of $X$ is

$$P(x) = \int_0^\infty \operatorname{Im}\left( \frac{e^{1-t}\, e^{it\sqrt{n}}}{\pi \left(t - \frac{1}{1 - i\sqrt{n}}\right) \prod_{k=1}^n \sqrt{1 - 2(t-1)\sigma_k^2/x + 2it\sigma_k^2\sqrt{n}/x}} \right) dt \tag{8}$$

for any positive real number $x$, and $P(x) = 0$ for any nonpositive real number $x$. The square
roots in (8) denote the principal branch, and Im takes the imaginary part.
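Proof. For any $k = 1, 2, \dots, n-1, n$, the characteristic function of $|X_k|^2$ is

$$\varphi_1(t) = \frac{1}{\sqrt{1 - 2it}}, \tag{9}$$

using the principal branch of the square root. By the independence of $X_1, X_2, \dots, X_{n-1}, X_n$,
the characteristic function of the random variable $X$ defined in (7) is therefore

$$\varphi(t) = \prod_{k=1}^n \varphi_1(t\sigma_k^2) = \frac{1}{\prod_{k=1}^n \sqrt{1 - 2it\sigma_k^2}}. \tag{10}$$

The probability density function of $X$ is therefore

$$p(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\, \varphi(t)\, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-itx}}{\prod_{k=1}^n \sqrt{1 - 2it\sigma_k^2}}\, dt \tag{11}$$

for any real number $x$, and the cdf of $X$ is

$$P(x) = \int_{-\infty}^{x} p(y)\, dy = \frac{1}{2} + \frac{i}{2\pi}\, \mathrm{PV} \int_{-\infty}^{\infty} \frac{e^{-itx}}{t \prod_{k=1}^n \sqrt{1 - 2it\sigma_k^2}}\, dt \tag{12}$$

for any real number $x$, where PV denotes the principal value.

It follows from the fact that $X$ is almost surely positive that the cdf $P(x)$ is identically
zero for $x \le 0$; there is no need to calculate the cdf for $x \le 0$. Substituting $t \mapsto t/x$ in (12)
yields that the cdf is

$$P(x) = \frac{1}{2} + \frac{i}{2\pi}\, \mathrm{PV} \int_{-\infty}^{\infty} \frac{e^{-it}}{t \prod_{k=1}^n \sqrt{1 - 2it\sigma_k^2/x}}\, dt \tag{13}$$

for any positive real number $x$, where again PV denotes the principal value. The branch
cuts for the integrand in (13) are all on the lower half of the imaginary axis.

Though the integration in (13) is along $(-\infty, \infty)$, we may shift contours and instead
integrate along the rays

$$\{(-\sqrt{n} - i)t + i : t \in (0, \infty)\} \tag{14}$$

and

$$\{(\sqrt{n} - i)t + i : t \in (0, \infty)\}, \tag{15}$$

obtaining from (13) that

$$P(x) = \frac{i}{2\pi} \int_0^\infty \left( \frac{e^{1-t}\, e^{-it\sqrt{n}}}{\left(t - \frac{1}{1 + i\sqrt{n}}\right) \prod_{k=1}^n \sqrt{1 - 2(t-1)\sigma_k^2/x - 2it\sigma_k^2\sqrt{n}/x}} - \frac{e^{1-t}\, e^{it\sqrt{n}}}{\left(t - \frac{1}{1 - i\sqrt{n}}\right) \prod_{k=1}^n \sqrt{1 - 2(t-1)\sigma_k^2/x + 2it\sigma_k^2\sqrt{n}/x}} \right) dt \tag{16}$$

for any positive real number $x$. Combining (16) and the definition of "Im" yields (8). □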
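
The algebra behind the contour shift is omitted in the proof; the following LaTeX sketch spells it out. The symbols $z$ (the complex integration variable in (13)) and $L(t)$ (the integrand along the ray (14)) are our own notation, introduced only for this illustration.

```latex
% z is the complex integration variable of (13); L(t) is shorthand, not the paper's notation.
\begin{align*}
\text{On (15):}\quad z &= (\sqrt{n} - i)t + i, &
-iz &= 1 - t - it\sqrt{n}, &
\frac{dz}{z} &= \frac{dt}{t - \frac{1}{1 + i\sqrt{n}}}, \\
\text{On (14):}\quad z &= (-\sqrt{n} - i)t + i, &
-iz &= 1 - t + it\sqrt{n}, &
\frac{dz}{z} &= \frac{dt}{t - \frac{1}{1 - i\sqrt{n}}}.
\end{align*}
% The shifted contour passes above the pole at z = 0, which absorbs the term 1/2 in (13).
% The ray (14) is traversed toward the point i (t decreasing), hence the minus sign below.
\[
L(t) = \frac{e^{1-t}\, e^{it\sqrt{n}}}
            {\bigl(t - \frac{1}{1 - i\sqrt{n}}\bigr)
             \prod_{k=1}^n \sqrt{1 - 2(t-1)\sigma_k^2/x + 2it\sigma_k^2\sqrt{n}/x}},
\qquad
P(x) = \frac{i}{2\pi} \int_0^\infty \bigl( \overline{L(t)} - L(t) \bigr)\, dt
     = \frac{i}{2\pi} \int_0^\infty (-2i)\, \operatorname{Im} L(t)\, dt
     = \frac{1}{\pi} \int_0^\infty \operatorname{Im} L(t)\, dt .
\]
```

The integrand along (15) is the complex conjugate of the integrand along (14), which is why only the imaginary part survives.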
Remark 3.2. We chose the contours (14) and (15) so that the absolute value of the expression
under the square root in (8) is greater than $\sqrt{n/(n+1)}$. Therefore,

$$\left| \prod_{k=1}^n \sqrt{1 - 2(t-1)\sigma_k^2/x + 2it\sigma_k^2\sqrt{n}/x} \right| > \left(\frac{n}{n+1}\right)^{n/4} > \frac{1}{e^{1/4}} \tag{17}$$

for any $t \in (0, \infty)$ and any $x \in (0, \infty)$. Thus, the integrand in (8) is never large for $t \in (0, \infty)$.
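
The lower bound on the expression under the square root can be checked directly; the following LaTeX sketch records one way to do so. The shorthand $w_k$ and $a_k = \sigma_k^2/x$ is ours, introduced only for this illustration.

```latex
% Shorthand (not the paper's notation): a_k = \sigma_k^2 / x > 0 and
% w_k(t) = 1 - 2(t-1)a_k + 2 i t a_k \sqrt{n}.
\begin{align*}
|w_k(t)|^2 &= (1 + 2a_k - 2a_k t)^2 + 4 a_k^2 t^2 n, \\
\frac{d}{dt}\,|w_k(t)|^2 = 0 \quad &\Longleftrightarrow \quad t = \frac{1 + 2a_k}{2a_k(n+1)}, \\
\min_{t}\, |w_k(t)|^2 &= (1 + 2a_k)^2\, \frac{n}{n+1} \;>\; \frac{n}{n+1}, \\
\Bigl|\prod_{k=1}^n \sqrt{w_k(t)}\Bigr| = \prod_{k=1}^n |w_k(t)|^{1/2}
  &> \Bigl(\frac{n}{n+1}\Bigr)^{n/4} = \Bigl(1 + \frac{1}{n}\Bigr)^{-n/4} > e^{-1/4}.
\end{align*}
```

The last inequality uses $(1 + 1/n)^n < e$ for every positive integer $n$.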

Remark 3.3. The integrand in (8) decays exponentially fast, at a rate independent of the
values of $\sigma_1, \sigma_2, \dots, \sigma_{n-1}, \sigma_n$, and $x$ (see the preceding remark).

Remark 3.4. An efficient means of evaluating (8) numerically is to employ adaptive Gaussian
quadratures; see, for example, Section 4.7 of [9]. To attain double-precision accuracy
(roughly 15-digit precision), the domain of integration for $t$ in (8) need be only $(0, 40)$ rather
than the whole $(0, \infty)$. Good choices for the lowest orders of the quadratures used in the
adaptive Gaussian quadratures are 10 and 21, for double-precision accuracy.
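
A minimal Python sketch of evaluating the integral (8) numerically follows. It assumes composite Simpson quadrature on the truncated interval $(0, 40)$ in place of the adaptive Gaussian quadratures recommended above, so it is a simplification rather than the method used for the numerical examples; the function name `cdf_sum_of_squares` and its parameters are hypothetical. The input `variances` holds the values $\sigma_k^2$.

```python
import cmath
import math


def cdf_sum_of_squares(x, variances, upper=40.0, intervals=8000):
    """Approximate P(x) in (8) for X = sum_k |sigma_k X_k|^2, with variances = sigma_k^2.

    Uses composite Simpson quadrature (an assumption of this sketch);
    `intervals` must be even.
    """
    if x <= 0:
        return 0.0
    n = len(variances)
    root_n = math.sqrt(n)
    shift = 1.0 / (1.0 - 1j * root_n)        # the constant 1 / (1 - i sqrt(n)) in (8)

    def integrand(t):
        numerator = cmath.exp(1.0 - t + 1j * t * root_n)
        denominator = math.pi * (t - shift)
        for v in variances:
            # cmath.sqrt returns the principal branch, as (8) requires
            denominator *= cmath.sqrt(1.0 - 2.0 * (t - 1.0) * v / x
                                      + 2j * t * v * root_n / x)
        return (numerator / denominator).imag

    h = upper / intervals
    total = integrand(0.0) + integrand(upper)
    for j in range(1, intervals):
        total += (4.0 if j % 2 else 2.0) * integrand(j * h)
    return total * h / 3.0


# With two unit variances, X is chi-square with 2 degrees of freedom,
# whose cdf is 1 - exp(-x/2); the two printed numbers should agree closely.
print(cdf_sum_of_squares(2.0, [1.0, 1.0]), 1.0 - math.exp(-1.0))
```

The comparison against the chi-square distribution with two degrees of freedom is a convenient check because that cdf has a closed form; a fixed-step rule is adequate here only because the number of variances is small.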



Remark 3.5. For a similar, more general approach, see [10]. For alternative approaches,
see [11]. Unlike these alternatives, the approach of the present section has an upper bound
on its required number of floating-point operations that depends only on the number $n$ of
bins and on the precision $\varepsilon$ of computations, not on the values of $\sigma_1, \sigma_2, \dots, \sigma_{n-1}, \sigma_n$,
or $x$. Indeed, it is easy to see that the numerical evaluation of (8) theoretically requires
$O(n \ln^2(\sqrt{n}/\varepsilon))$ quadrature nodes: the denominator of the integrand in (8) cannot oscillate
more than $n + 1$ times (once for each "pole") as $t$ ranges from 0 to $\infty$, while the numerator
of the integrand cannot oscillate more than $\sqrt{n}\,\ln(2\sqrt{n}/\varepsilon)$ times as $t$ ranges from 0 to
$\ln(2\sqrt{n}/\varepsilon)$; furthermore, the domain of integration for $t$ in (8) need be only $(0, \ln(2\sqrt{n}/\varepsilon))$
rather than the whole $(0, \infty)$. In practice, using several hundred quadrature nodes produces
double-precision accuracy (roughly 15-digit precision); see, for example, Section 5 below.
Also, the observed performance is similar when subtracting the imaginary unit $i$ from the
contours (14) and (15).
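
As a worked illustration of the truncation length $\ln(2\sqrt{n}/\varepsilon)$, the following arithmetic assumes $\varepsilon = 2 \times 10^{-16}$ (the double-precision value quoted in Section 5) and two of the bin counts used there; the rounded values are ours.

```latex
\begin{align*}
n = 500:\quad \ln\!\left(\frac{2\sqrt{500}}{2 \times 10^{-16}}\right)
  &\approx \ln\bigl(2.236 \times 10^{17}\bigr) \approx 0.80 + 39.14 \approx 39.9, \\
n = 10:\quad \ln\!\left(\frac{2\sqrt{10}}{2 \times 10^{-16}}\right)
  &\approx \ln\bigl(3.162 \times 10^{16}\bigr) \approx 1.15 + 36.84 \approx 38.0 .
\end{align*}
```

Both values are consistent with the integration domain $(0, 40)$ suggested in Remark 3.4.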



4. A procedure for computing the confidence levels 

An efficient method for calculating the confidence levels in the limit of large numbers of
draws proceeds as follows. Given i.i.d. draws from any distribution (not necessarily from
the specified model), we can form the associated statistic $X$ defined in (2) and (1); in the
limit of large numbers of draws, the confidence level that the draws do not arise from the
model is then just the cumulative distribution function $P$ in (8) evaluated at $x = X$, with
$\sigma_k^2$ in (8) being the inverses of the positive eigenvalues of the matrix $B$ defined in (6). After
all, $P(x)$ is then the probability that $x$ is greater than the sum of the squares of independent
centered Gaussian random variables whose variances are the multiplicative inverses of the
positive eigenvalues of $B$. Remark 3.4 above describes an efficient means of evaluating $P(x)$
numerically.
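
The eigenvalue step of this procedure can be sketched in Python as follows, assuming a plain cyclic Jacobi iteration on the matrix $B$ from (6). The function names are hypothetical, and the sketch does not exploit the diagonal-plus-low-rank structure noted in Remark 2.2.

```python
import math


def matrix_B(probs):
    """Entries of B = P D P as given in (6)."""
    n = len(probs)
    inv = [1.0 / p for p in probs]
    avg = sum(inv) / (n * n)
    return [[(inv[k] if j == k else 0.0) - (inv[j] + inv[k]) / n + avg
             for k in range(n)] for j in range(n)]


def jacobi_eigenvalues(A, max_sweeps=100, tol=1e-14):
    """Eigenvalues (ascending) of a real symmetric matrix via cyclic Jacobi rotations."""
    A = [row[:] for row in A]
    n = len(A)
    for _ in range(max_sweeps):
        off = math.sqrt(sum(A[j][k] ** 2 for j in range(n) for k in range(n) if j != k))
        diag = math.sqrt(sum(A[k][k] ** 2 for k in range(n)))
        if off <= tol * diag:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):           # rotate columns p and q
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
                for k in range(n):           # rotate rows p and q
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
    return sorted(A[k][k] for k in range(n))


def variances_along_principal_axes(probs):
    """Inverses of the positive eigenvalues of B; the single zero eigenvalue is dropped."""
    eigenvalues = jacobi_eigenvalues(matrix_B(probs))
    return [1.0 / lam for lam in eigenvalues[1:]]


# For a uniform model on 4 bins, B = 4P, so the three variances should all be about 1/4.
print(variances_along_principal_axes([0.25, 0.25, 0.25, 0.25]))
```

The resulting variances are the values $\sigma_k^2$ to be supplied to the integral (8).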
5. Numerical examples

This section illustrates the performance of the algorithm of Section 4 via several numerical
examples.

Below, we plot the complementary cumulative distribution function of the square of the
root-mean-square statistic whose probability distribution is determined in Section 2, in the
limit of large numbers of draws. This is the distribution of the statistic $X$ defined in (2) when
the i.i.d. draws used to form $X$ come from the same model distribution $p_1, p_2, \dots, p_{n-1}, p_n$
used in (1) for defining $X$. In order to evaluate the cumulative distribution function (cdf) $P$,
we apply adaptive Gaussian quadratures to the integral in (8) as described in Section 3,
obtaining $\sigma_k$ in (8) via the algorithm described in Section 4.

In applications to goodness-of-fit testing, if the statistic $X$ from (2) takes on a value $x$,
then the confidence level that the draws do not arise from the model distribution is the
cdf $P$ in (8) evaluated at $x$; the corresponding significance level
is therefore $1 - P(x)$. Figures 1 and 2 plot the significance level ($1 - P(x)$)
versus $x$ for six example model distributions (examples a, b, c, d, e, f). Table 3 provides
formulae for the model distributions used in the six examples. Tables 1 and 2 summarize
the computational costs required to attain at least 9-digit absolute accuracy for the plots
in Figures 1 and 2, respectively. Each plot displays $1 - P(x)$ at 100 values for $x$. Figure 2
focuses on the tails of the distributions, corresponding to suitably high confidence levels.

The following list describes the headings of the tables: 

• $n$ is the number of bins/categories/cells/classes in Section 2 ($p_1, p_2, \dots, p_{n-1}, p_n$ are the
probabilities of drawing the corresponding bins under the specified model distribution).

• $l$ is the maximum number of quadrature nodes required in any of the 100 evaluations
of $1 - P(x)$ displayed in each plot of Figures 1 and 2.

• $t$ is the total number of seconds required to perform the quadratures for all 100 evaluations
of $1 - P(x)$ displayed in each plot of Figures 1 and 2.

• $p_k$ is the probability associated with bin $k$ ($k = 1, 2, \dots, n-1, n$) in Section 2. The
constants $C_{(a)}$, $C_{(b)}$, $C_{(c)}$, $C_{(d)}$, $C_{(e)}$, $C_{(f)}$ in Table 3 are the positive real numbers chosen
such that $\sum_{k=1}^n p_k = 1$. For any real number $x$, the floor $\lfloor x \rfloor$ is the greatest integer
less than or equal to $x$; the probability distributions for examples (c) and (d) involve
the floor.

We used Fortran 77 and ran all examples on one core of a 2.2 GHz Intel Core 2 Duo
microprocessor with 2 MB of L2 cache. Our code is compliant with the IEEE double-precision
standard (so that the mantissas of variables have approximately one bit of precision less than
16 digits, yielding a relative precision of about 2E-16). We diagonalized the matrix $B$ defined
in (6) using the Jacobi algorithm (see, for example, Chapter 8 of [12]), not taking advantage
of Remark 2.2; explicitly forming the entries of the matrix $B$ defined in (6) can incur a numerical
error of at most the machine precision (about 2E-16) times $\max_{1 \le k \le n} p_k / \min_{1 \le k \le n} p_k$,
yielding 9-digit accuracy or better for all our examples. A future article will exploit the
interlacing properties of eigenvalues, as in [6], to obtain higher precision. Of course, even
5-digit precision would suffice for most statistical applications; however, modern computers
can produce high accuracy very fast, as the examples in this section illustrate.
Fig. 1: The vertical axis is $1 - P(x)$ from (8); the horizontal axis is $x$.
Table 1: Values for Figure 1

          n      l      t
(a)     500    310    5.0
(b)     250    270    2.4
(c)     100    250    0.9
(d)      50    250    0.5
(e)      25    330    0.3
(f)      10    270    0.1



Table 2: Values for Figure 2

          n      l      t
(a)     500    310    5.7
(b)     250    330    3.0
(c)     100    270    1.0
(d)      50    290    0.6
(e)      25    350    0.4
(f)      10    270    0.2



Table 3: Values for both Figure 1 and Figure 2

          n    p_k
(a)     500    $C_{(a)} \cdot (300 + k)^{-2}$
(b)     250    $C_{(b)} \cdot (260 - k)^3$
(c)     100    $C_{(c)} \cdot \lfloor (40 + k)/40 \rfloor^{-1/6}$
(d)      50    $C_{(d)} \cdot (1/2 + \ln \lfloor (61 - k)/10 \rfloor)$
(e)      25    $C_{(e)} \cdot \exp(-5k/8)$
(f)      10    $C_{(f)} \cdot \exp(-(k-1)^2/6)$
6. The power of the root-mean-square

This section very briefly compares the statistic defined in (2) and the classic $\chi^2$ statistic
defined in (3). This abbreviated comparison is in no way complete; a much more comprehensive
treatment constitutes a forthcoming article.

We will discuss four statistics in all: the root-mean-square, $\chi^2$, the (log)likelihood-ratio,
and the Freeman-Tukey or Hellinger distance. We use $p_1, p_2, \dots, p_{n-1}, p_n$ to denote the
expected fractions of the $m$ i.i.d. draws falling in each of the $n$ bins, and $Y_1, Y_2, \dots, Y_{n-1}, Y_n$
to denote the observed fractions of the $m$ draws falling in the $n$ bins. That is, $p_1, p_2, \dots,
p_{n-1}, p_n$ are the probabilities associated with the $n$ bins in the model distribution, whereas
$Y_1, Y_2, \dots, Y_{n-1}, Y_n$ are the fractions of the $m$ draws falling in the $n$ bins when we take the
draws from a certain "actual" distribution that may differ from the model.

With this notation, the square of the root-mean-square statistic is

$$X = m \sum_{k=1}^n (Y_k - p_k)^2. \tag{18}$$

We use the designation "root-mean-square" to label the lines associated with $X$ in the plots
below.

The classic Pearson $\chi^2$ statistic is

$$\chi^2 = m \sum_{k=1}^n \frac{(Y_k - p_k)^2}{p_k}. \tag{19}$$

We use the standard designation "$\chi^2$" to label the lines associated with $\chi^2$ in the plots below.

The (log)likelihood-ratio or "$G^2$" statistic is

$$G^2 = 2m \sum_{k=1}^n Y_k \ln\left(\frac{Y_k}{p_k}\right), \tag{20}$$

under the convention that $Y_k \ln(Y_k/p_k) = 0$ if $Y_k = 0$. We use the common designation "$G^2$"
to label the lines associated with $G^2$ in the plots below.

The Freeman-Tukey or Hellinger-distance statistic is

$$H^2 = 4m \sum_{k=1}^n \left(\sqrt{Y_k} - \sqrt{p_k}\right)^2. \tag{21}$$

We use the well-known designation "Freeman-Tukey" to label the lines associated with $H^2$
in the plots below.
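
The four statistics in (18)-(21) can be computed side by side; the following Python sketch is one way to do so. The function name `goodness_of_fit_statistics` and the sample data are hypothetical, introduced only for this illustration.

```python
import math


def goodness_of_fit_statistics(counts, probs):
    """Return the four statistics of (18)-(21) for bin counts and model probabilities."""
    m = sum(counts)
    fractions = [c / m for c in counts]      # observed fractions Y_k
    pairs = list(zip(fractions, probs))
    rms_sq = m * sum((y - p) ** 2 for y, p in pairs)                         # (18)
    chi_sq = m * sum((y - p) ** 2 / p for y, p in pairs)                     # (19)
    # Terms with Y_k = 0 are skipped, matching the convention stated for (20).
    g_sq = 2.0 * m * sum(y * math.log(y / p) for y, p in pairs if y > 0)     # (20)
    h_sq = 4.0 * m * sum((math.sqrt(y) - math.sqrt(p)) ** 2 for y, p in pairs)  # (21)
    return {"root-mean-square": rms_sq, "chi2": chi_sq, "G2": g_sq, "Freeman-Tukey": h_sq}


stats = goodness_of_fit_statistics([30, 50, 20], [0.25, 0.5, 0.25])  # hypothetical data
for name, value in stats.items():
    print(name, value)
```

For these sample data, $X = 0.5$ and $\chi^2 = 2$, which shows how the division by $p_k$ in (19) reweights the same squared discrepancies.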

In the limit that the number $m$ of draws is large, the distributions of $\chi^2$ defined in (19),
$G^2$ defined in (20), and $H^2$ defined in (21) are all the same when the actual distribution of
the draws is identical to the model (see, for example, [1]). However, when the number $m$
of draws is not large, then their distributions can differ substantially. In this section, we
compute confidence levels via Monte Carlo simulations, without relying on the number $m$ of
draws to be large.
Remark 6.1. Below, we say that a statistic based on given i.i.d. draws "distinguishes"
the actual distribution of the draws from the model distribution to mean that the computed 
confidence level is at least 99% for 99% of 40,000 simulations, with each simulation generating 
m i.i.d. draws according to the actual distribution. We computed the confidence levels 
by conducting 40,000 simulations, each generating m i.i.d. draws according to the model 
distribution. 
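
A Monte Carlo confidence level of the kind used in this section can be sketched in Python as follows. The function names are hypothetical, the default of 2,000 simulations is an assumption made to keep the example fast (Remark 6.1 uses 40,000), and counting simulated values strictly below the observed one is one reasonable convention among several.

```python
import random


def rms_statistic_squared(counts, probs):
    """Square of the root-mean-square statistic, m * sum_k (Y_k - p_k)^2."""
    m = sum(counts)
    return m * sum((c / m - p) ** 2 for c, p in zip(counts, probs))


def monte_carlo_confidence(counts, probs, simulations=2000, seed=0):
    """Estimate the confidence level that `counts` did not arise from `probs`.

    Simulates data sets of the same size from the model and reports the fraction
    whose statistic is strictly less than the observed statistic.
    """
    rng = random.Random(seed)
    m, n = sum(counts), len(probs)
    observed = rms_statistic_squared(counts, probs)
    below = 0
    for _ in range(simulations):
        simulated = [0] * n
        for k in rng.choices(range(n), weights=probs, k=m):
            simulated[k] += 1
        if rms_statistic_squared(simulated, probs) < observed:
            below += 1
    return below / simulations


model = [0.25, 0.25, 0.25, 0.25]                        # hypothetical uniform model
print(monte_carlo_confidence([50, 0, 0, 0], model))     # strongly discrepant data
print(monte_carlo_confidence([25, 25, 25, 25], model))  # data matching the model exactly
```

Fixing the seed makes the estimate reproducible; in practice the number of simulations controls the resolution of the estimated confidence level.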

6.1. First example

Let us first specify the model distribution to be

$$p_1 = \frac{1}{4}, \tag{22}$$

$$p_2 = \frac{1}{4}, \tag{23}$$

$$p_k = \frac{1}{2n - 4} \tag{24}$$

for $k = 3, 4, \dots, n-1, n$. We consider $m$ i.i.d. draws from the distribution

$$\tilde{p}_1 = \frac{3}{8}, \tag{25}$$

$$\tilde{p}_2 = \frac{1}{8}, \tag{26}$$

$$\tilde{p}_k = p_k \tag{27}$$

for $k = 3, 4, \dots, n-1, n$, where $p_3, p_4, \dots, p_{n-1}, p_n$ are the same as in (24).

Figure 3 plots the percentage of 40,000 simulations, each generating 200 i.i.d. draws according
to the actual distribution defined in (25)-(27), that are successfully detected as not
arising from the model distribution at the 1% significance level (meaning that the associated
statistic for the simulation yields a confidence level of 99% or greater). We computed the
significance levels by conducting 40,000 simulations, each generating 200 i.i.d. draws according
to the model distribution defined in (22)-(24). Figure 3 shows that the root-mean-square
is successful in at least 99% of the simulations, while the classic $\chi^2$ statistic fails often,
succeeding in only 81% of the simulations for $n = 16$, and less than 5% for $n > 256$.

Figure 4 plots the number $m$ of draws required to distinguish the actual distribution
defined in (25)-(27) from the model distribution defined in (22)-(24). Remark 6.1 above
specifies what we mean by "distinguish." Figure 4 shows that the root-mean-square requires
only about $m = 185$ draws for any number $n$ of bins, while the classic $\chi^2$ statistic requires
90% more draws for $n = 16$, and greater than 300% more for $n > 128$. Furthermore, the
classic $\chi^2$ statistic requires increasingly many draws as the number $n$ of bins increases, unlike
the root-mean-square.
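
A reduced-scale version of the experiment behind Figure 3 can be sketched in Python as follows. The sketch assumes that the actual distribution places probabilities 3/8 and 1/8 on the first two bins (our reading of (25) and (26)), uses only 400 simulations per distribution instead of 40,000, and all function names are hypothetical; its output is therefore only a rough indication, not a reproduction of the reported figures.

```python
import itertools
import random


def first_example(n):
    """Model of (22)-(24) and assumed actual distribution of (25)-(27) for n bins."""
    tail = [1.0 / (2 * n - 4)] * (n - 2)
    model = [0.25, 0.25] + tail
    actual = [0.375, 0.125] + tail           # assumed reading of (25) and (26)
    return model, actual


def simulate(rng, cum_weights, model, m):
    """Draw m times; return (square of root-mean-square, chi-square) against the model."""
    n = len(model)
    counts = [0] * n
    for k in rng.choices(range(n), cum_weights=cum_weights, k=m):
        counts[k] += 1
    rms_sq = m * sum((c / m - p) ** 2 for c, p in zip(counts, model))
    chi_sq = m * sum((c / m - p) ** 2 / p for c, p in zip(counts, model))
    return rms_sq, chi_sq


def estimate_power(n, m=200, simulations=400, level=0.01, seed=1):
    """Fraction of simulations from the actual distribution rejected at the given level."""
    rng = random.Random(seed)
    model, actual = first_example(n)
    cum_model = list(itertools.accumulate(model))
    cum_actual = list(itertools.accumulate(actual))
    null = [simulate(rng, cum_model, model, m) for _ in range(simulations)]
    alt = [simulate(rng, cum_actual, model, m) for _ in range(simulations)]
    cutoff = simulations - int(level * simulations) - 1
    powers = []
    for j in (0, 1):
        # Critical value: at most a fraction `level` of null statistics exceed it.
        critical = sorted(s[j] for s in null)[cutoff]
        powers.append(sum(s[j] > critical for s in alt) / simulations)
    return tuple(powers)


power_rms, power_chi = estimate_power(128)
print("root-mean-square:", power_rms, "chi-square:", power_chi)
```

With so few simulations the critical values are noisy, so the printed fractions should be read qualitatively.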
Fig. 4: First example (statistical "efficiency"); see Subsection 6.1. The horizontal axis is the number ($n$) of bins.
Fig. 5: Second example; see Subsection 6.2. The horizontal axis is the number ($n$) of bins; the plotted lines include "Freeman-Tukey" and "root-mean-square".

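6.2. Second example

Next, let us specify the model distribution to be

$$p_k = \frac{C_1}{k} \tag{28}$$

for $k = 1, 2, \dots, n-1, n$, where

$$C_1 = \frac{1}{\sum_{k=1}^n 1/k}. \tag{29}$$

We consider $m$ i.i.d. draws from the distribution

$$\tilde{p}_k = \frac{C_2}{k^2} \tag{30}$$

for $k = 1, 2, \dots, n-1, n$, where

$$C_2 = \frac{1}{\sum_{k=1}^n 1/k^2}. \tag{31}$$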

Figure 5 plots the number $m$ of draws required to distinguish the actual distribution
defined in (30) and (31) from the model distribution defined in (28) and (29). Remark 6.1
above specifies what we mean by "distinguish." Figure 5 shows that the classic $\chi^2$ statistic
requires increasingly many draws as the number $n$ of bins increases, while the root-mean-square
exhibits the opposite behavior.
Fig. 6: Third example; see Subsection 6.3. The horizontal axis is the number ($n$) of bins.



6.3. Third example

Let us again specify the model distribution to be

$$p_k = \frac{C_1}{k} \tag{32}$$

for $k = 1, 2, \dots, n-1, n$, where

$$C_1 = \frac{1}{\sum_{k=1}^n 1/k}. \tag{33}$$

We now consider $m$ i.i.d. draws from the distribution

$$\tilde{p}_k = \frac{C_{1/2}}{\sqrt{k}} \tag{34}$$

for $k = 1, 2, \dots, n-1, n$, where

$$C_{1/2} = \frac{1}{\sum_{k=1}^n 1/\sqrt{k}}. \tag{35}$$

Figure 6 plots the number $m$ of draws required to distinguish the actual distribution
defined in (34) and (35) from the model distribution defined in (32) and (33). Remark 6.1
above specifies what we mean by "distinguish." The root-mean-square is not uniformly more
powerful than the other statistics in this example; see Remark 6.2 below.
Fig. 7: Fourth example; see Subsection 6.4



6.4. Fourth example

We turn now to models involving parameter estimation (for details, see [8]). Let us
specify the model distribution to be the Zipf distribution

$$p_k(\theta) = \frac{C_\theta}{k^\theta} \tag{36}$$

for $k = 1, 2, \dots, 99, 100$, where

$$C_\theta = \frac{1}{\sum_{k=1}^{100} 1/k^\theta}; \tag{37}$$

we estimate the parameter $\theta$ via maximum-likelihood methods (see [8]). We consider $m$ i.i.d.
draws from the (truncated) geometric distribution

$$\tilde{p}_k = c_t\, t^k \tag{38}$$

for $k = 1, 2, \dots, 99, 100$, where

$$c_t = \frac{1}{\sum_{k=1}^{100} t^k}; \tag{39}$$

Figure 7 considers several values for $t$.

Figure 7 plots the number $m$ of draws required to distinguish the actual distribution
defined in (38) and (39) from the model distribution defined in (36) and (37), estimating the
parameter $\theta$ in (36) and (37) via maximum-likelihood methods. Remark 6.1 above specifies
what we mean by "distinguish."
6.5. Fifth example

The model for our final example involves parameter estimation, too (for details, see [8]).
Let us specify the model distribution to be

$$p_k(\theta) = \theta^{k-1}(1 - \theta) \tag{40}$$

for $k = 1, 2, \dots, 98, 99$, and

$$p_{100}(\theta) = \theta^{99}; \tag{41}$$

we estimate the parameter $\theta$ via maximum-likelihood methods (see [8]). We consider $m$ i.i.d.
draws from the Zipf distribution

$$\tilde{p}_k = \frac{C_t}{k^t} \tag{42}$$

for $k = 1, 2, \dots, 99, 100$, where

$$C_t = \frac{1}{\sum_{k=1}^{100} 1/k^t}; \tag{43}$$

Figure 8 considers several values for $t$.

Figure 8 plots the number $m$ of draws required to distinguish the actual distribution
defined in (42) and (43) from the model distribution defined in (40) and (41), estimating the
parameter $\theta$ in (40) and (41) via maximum-likelihood methods. Remark 6.1 above specifies
what we mean by "distinguish."

Remark 6.2. The root-mean-square statistic is not very sensitive to relative discrepancies
between the model and actual distributions in bins whose associated model probabilities are
small. When sensitivity in these bins is desirable, we recommend using both the root-mean-square
statistic defined in (2) and an asymptotically equivalent variation of $\chi^2$ defined in (3),
such as the (log)likelihood-ratio or "$G^2$" test; see, for example, [1].
7. Conclusions and generalizations

This paper provides efficient black-box algorithms for computing the confidence levels
for one of the most natural goodness-of-fit statistics, in the limit of large numbers of draws.
As mentioned briefly above (in Remark 2.3), our methods can handle model distributions
specified via the multinomial maximum-likelihood estimation of parameters from the data;
for details, see [8]. Moreover, we can handle model distributions with infinitely many bins;
for details, see Observation 1 in Section 4 of [8]. Furthermore, we can handle arbitrarily
weighted means in the root-mean-square, in addition to the usual, uniformly weighted average
considered above. Finally, combining our methods and the statistical bootstrap should
produce a test for whether two separate sets of draws arise from the same or from different
distributions, when each set is taken i.i.d. from some (unspecified) distribution associated
with the set (see, for example, [13]).

The natural statistic has many advantages over more standard $\chi^2$ tests, as forthcoming
papers will demonstrate. The classic $\chi^2$ statistic for goodness-of-fit, and especially variations
such as the (log)likelihood-ratio, "$G^2$," and power-divergence statistics (see [1]), can be sensible
supplements, but are not good alternatives when used alone. With the now widespread
availability of computers, calculating significance levels via Monte Carlo simulations for the
more natural statistic of the present article can be feasible; the algorithms of the present
paper can also be suitable, and are efficient and easy to use.